{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![](../images/logo2.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Introduction to Matplotlib"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Matplotlib is a Python visualization library producing publication quality figures in a variety of hardcopy formats and interactive environments. Matplotlib can be used in Python scripts, the Python and IPython shell, web application servers, and various graphical user interface toolkits."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# tutorial: http://bit.ly/scipympl19"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Get to know the library!\n",
    "1. Go to http://bit.ly/mpl_gallery\n",
    "2. Pick a visualization you'd like to learn how to create\n",
    "3. Turn to your neighbor & discuss w/ each other why you want to learn to make it\n",
    "4. Post your images to the slack! \n",
    "5. optional: tweet your faves & tag @matplotlib and #scipy2019 ;)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Getting Help!\n",
    "\n",
    "During this tutorial, please flag down Hannah, Tom (whoever isn't teaching at the moment) or Kimberly. \n",
    "\n",
    "The easiest way to do this is to post in the Scipy2019 #matplotlib slack channel \n",
    "\n",
    "__stickies__: Put a sticky note on your laptop. If you've used the flags before, we don't have enough of the orange so in this class all stickies mean please help!"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### After the tutorial"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To follow up on the material discussed in this tutorial:"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "__Documentation__:\n",
    "* https://matplotlib.org/"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "__Mailing lists__:\n",
    "* [User](https://mail.python.org/mailman/listinfo/matplotlib-users): matplotlib-users@python.org\n",
    "* [Announcement](https://mail.python.org/mailman/listinfo/matplotlib-announce): matplotlib-announce@python.org\n",
    "* [Development](https://mail.python.org/mailman/listinfo/matplotlib-devel): matplotlib-devel@python.org"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "__Social__:\n",
    "* twitter: [@matplotlib](https://twitter.com/matplotlib)\n",
    "* gitter chat: https://gitter.im/matplotlib/matplotlib"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## What do you need for this tutorial?\n",
    "Installation instructions can be found [here](installation.md). We will be using __Python 3__.  We will also be using the [Pandas](https://pandas.pydata.org/) data analysis and the [NumPy](https://www.numpy.org/) numerical analysis libraries to load in and process much of the data that we are trying to visualize. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Import libraries and check versions\n",
    "To use a library in Python, we need to first import it. In this code block, we also print the version of the libraries we are importing for reproducibility. Sometimes minor changes in the libraries between versions will cause code to behave unexpectedly - for example the images you produce may look slightly different from the ones in this tutorial. We are using Python [format strings](https://docs.python.org/3.4/library/string.html#string-formatting) for the printing.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import matplotlib\n",
    "print(f'Matplotlib: {matplotlib.__version__}')\n",
    "import matplotlib.pyplot as plt # load in matplotlib plotting tools\n",
    "import pandas as pd # rename as pd by convention\n",
    "print(f\"pandas: {pd.__version__}\")\n",
    "import numpy as np  # rename as np by convention\n",
    "print(f\"numpy: {np.__version__}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Get familiar with the titanic dataset"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For this tutorial we are using the [Kaggle Titanic dataset](https://www.kaggle.com/c/titanic/data) because it has a mix of quantitative and categorical variables and is well suited to data exploration. In this tutorial, we will explore the demographics of passengers on the Titanic. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "| Variable | \tDefinition | \tKey |\n",
    "|-----------:|-------------:|-------:|\n",
    "| survival | \tSurvival \t| 0 = No, 1 = Yes |\n",
    "| pclass | Ticket class \t| 1 = 1st, 2 = 2nd, 3 = 3rd |\n",
    "| sex \t| sex | | \t\n",
    "| age \t| age in years \t | | \n",
    "| sibsp |\t# of siblings / spouses on board \t| |\n",
    "| parch |\t# of parents / children on board  | |\t\n",
    "| ticket| \tTicket number ||\n",
    "| fare  |\tPassenger fare \t||\n",
    "| cabin |\tCabin number \t||\n",
    "| embarked |Port of Embarkation | \tC = Cherbourg, Q = Queenstown, S = Southampton|"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "__pclass:__ A proxy for socio-economic status (SES)\n",
    "* 1st = Upper\n",
    "* 2nd = Middle\n",
    "* 3rd = Lower\n",
    "\n",
    "__age:__ Age is fractional if less than 1. If the age is estimated, is it in the form of xx.5\n",
    "\n",
    "__sibsp:__ The dataset defines family relations in this way...\n",
    "* Sibling = brother, sister, stepbrother, stepsister\n",
    "* Spouse = husband, wife (mistresses and fiancés were ignored)\n",
    "\n",
    "__parch:__ The dataset defines family relations in this way...\n",
    "* Parent = mother, father\n",
    "* Child = daughter, son, stepdaughter, stepson\n",
    "* Some children travelled only with a nanny, therefore parch=0 for them."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You are welcome to download a local copy from http://bit.ly/tcsv19. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df = pd.read_csv(\"http://bit.ly/tcsv19\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We use the `.columns` method of  print the columns in our dataframe so that we have a reference when trying to access this data throughout this tutorial."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df.columns"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Let's test our install\n",
    "\n",
    "Here we open the Titanic dataset via url and plot the sorted ages of the passengers. We select the ages from our dataframe (spreadsheet) using `df['age']`, and use numpy's sort because it can handle the missing values in our age column. We use `%matplotlib inline` to tell jupyter to show the matplotlib images. We will unpack the figure generating code in the next couple of notebooks, but basically `fig, ax` creates the area to plot on, and `ax.plot` draws the scatter plot. `_` is used for assignment variables we don't care about, and here specifically we also use it to suppress output. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%matplotlib inline\n",
    "fig, ax = plt.subplots()\n",
    "_ = ax.plot(np.sort(df['age']), marker='o', markersize=1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Lightning notebook introduction!\n",
    "\n",
    " - notebooks support tab completion!  In the above cell if we typed `ax.pl<TAB>` we would get a list of possible completion\n",
    " - you can use `?` to get a function's documentation string, which is how the function is documented inside the source code."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "? ax.plot"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.3"
  },
  "latex_envs": {
   "LaTeX_envs_menu_present": true,
   "autoclose": false,
   "autocomplete": true,
   "bibliofile": "biblio.bib",
   "cite_by": "apalike",
   "current_citInitial": 1,
   "eqLabelWithNumbers": true,
   "eqNumInitial": 1,
   "hotkeys": {
    "equation": "Ctrl-E",
    "itemize": "Ctrl-I"
   },
   "labels_anchors": false,
   "latex_user_defs": false,
   "report_style_numbering": false,
   "user_envs_cfg": false
  },
  "varInspector": {
   "cols": {
    "lenName": 16,
    "lenType": 16,
    "lenVar": 40
   },
   "kernels_config": {
    "python": {
     "delete_cmd_postfix": "",
     "delete_cmd_prefix": "del ",
     "library": "var_list.py",
     "varRefreshCmd": "print(var_dic_list())"
    },
    "r": {
     "delete_cmd_postfix": ") ",
     "delete_cmd_prefix": "rm(",
     "library": "var_list.r",
     "varRefreshCmd": "cat(var_dic_list()) "
    }
   },
   "types_to_exclude": [
    "module",
    "function",
    "builtin_function_or_method",
    "instance",
    "_Feature"
   ],
   "window_display": false
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
